Abstract

We address the problem of recovering an n-vector from m linear measurements lacking sign
or phase information. We show that lifting and semidefinite relaxation suffice by themselves for
stable recovery in the setting of $m = O(n \log n)$ random sensing vectors, with high probability.
The recovery method is optimizationless in the sense that trace minimization is unnecessary. We
further demonstrate that a simple algorithm of projection onto convex sets converges linearly
toward the unique solution.

1 Introduction

We study the recovery of a vector $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$ from the set of phaseless linear measurements

$|\langle x_0, z_i \rangle|$ for $i = 1, \ldots, m$,

where $z_i \in \mathbb{R}^n$ or $\mathbb{C}^n$ are known random sensing vectors. Such amplitude-only measurements arise
in a variety of imaging applications, such as X-ray crystallography, optics, and microscopy. We
seek stable and efficient methods for finding $x_0$ using as few measurements as possible.

This recovery problem is difficult because the set of real or complex numbers with a given
magnitude is nonconvex. In the real case, there are $2^m$ possible assignments of sign to the $m$ phaseless
measurements. Hence, exhaustive searching is infeasible. In the complex case, the situation is even
worse, as there is a continuum of phase assignments to consider. A method based on alternating
projections avoids an exhaustive search but does not always converge toward a solution [6, 7, 9].

In [3, 4, 5], the authors convexify the problem by lifting it to the space of $n \times n$ matrices,
where $xx^*$ is a proxy for the vector $x$. A key motivation for this lifting is that the nonconvex
measurements on vectors become linear measurements on matrices. The rank-1 constraint is then
relaxed to a trace minimization over the cone of positive semi-definite matrices, as is now standard
in matrix completion [10]. This convex program is called PhaseLift in [1], where it is shown that
$x_0$ can be found robustly in the case of random $z_i$, if $m = O(n \log n)$. The matrix minimizer is
unique, which in turn determines $x_0$ up to a global phase.

The contribution of the present paper is to show that trace minimization is unnecessary in this
lifting framework for the phaseless recovery problem. The vector $x_0$ can be recovered robustly by
an optimizationless convex problem: one of finding a positive semi-definite matrix that is consistent
with linear measurements. We prove there is only one such matrix, provided that there are
$O(n \log n)$ measurements. In other words, the phase recovery problem can be solved by intersecting
two convex sets, without minimizing an objective. We show empirically that a simple algorithm of
projection onto convex sets converges linearly (exponentially fast) toward the solution.

In [1], the authors show that the complex phaseless recovery problem from random measurements
is determined if $m \ge 4n - 2$ (with probability one). This means that the $x$ satisfying
$|\langle x, z_i \rangle| = |\langle x_0, z_i \rangle|$ is unique and equal to $x_0$ (up to a global phase), regardless of the method used to find it. A corollary
of the analysis in [4], and of the present paper, is that this property is stable under perturbations
of the data, provided $m = O(n \log n)$. This determinacy is in contrast to compressed sensing and
matrix completion, where a prior (sparsity, low-rank) is used to select a solution of an otherwise
underdetermined system of equations. The relaxation of this prior ($\ell_1$ norm, nuclear norm) is then
typically shown to determine the same solution. No such prior is needed here; the semi-definite
relaxation helps find the solution, not determine it.

The determinacy of the recovery problem over $n \times n$ matrices may be unexpected because there
are $n^2$ unknowns and only $O(n \log n)$ measurements. What compensates for the apparent lack of
data is the fact that the matrix we seek has rank one and is thus on the edge of the cone of positive
semi-definite matrices. Most perturbed matrices that are consistent with the measurements cease
to remain positive semi-definite. In other words, the positive semi-definite cone $X \succeq 0$ is "spiky"
around a rank-1 matrix $X_0$. That is, with high probability, particular random hyperplanes that
contain $X_0$ and have large enough codimension will have no other intersection with the cone.

The present paper does not advocate for fully abandoning trace minimization in the context of
phase retrieval. The structure of the sensing matrices appears to affect the number of measurements
required for recovery. Consider measurements of the form $x_0^* \Phi x_0$, for some $\Phi$. Numerical
simulations (not shown) suggest that $O(n^2)$ measurements are needed if $\Phi$ is a matrix with Gaussian
i.i.d. entries. On the other hand, it was shown in [10] that minimization of the nuclear norm
constrained by $\mathrm{Tr}(X \Phi) = x_0^* \Phi x_0$ recovers $x_0 x_0^*$ with high probability as soon as $m = O(n \log n)$.
Other numerical observations (not shown) suggest that it is the symmetric, positive semi-definite
character of $\Phi$ that allows for optimizationless recovery.

The present paper owes much to [2], as our analysis is very similar to theirs. We also wish to
reference the papers [13], where phase recovery is cast as a synchronization problem and solved via
a semi-definite relaxation of max-cut type over the complex torus (i.e., the magnitude information is
first factored out). The idea of lifting and semi-definite relaxation was introduced very successfully
for the max-cut problem in [8]. The paper [11] also introduces a fast and efficient method based on 
eigenvectors of the graph connection Laplacian for solving the angular synchronization problem. 
The performance of this latter method was further studied in [2]. 

1.1 Problem Statement and Main Result 

Let $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$ be a vector for which we have the $m$ measurements $|\langle x_0, z_i \rangle| = \sqrt{b_i}$, for
independent sensing vectors $z_i$ distributed uniformly on the unit sphere. We write the phaseless
recovery problem for $x_0$ as

Find $x$ such that $A(x) = b$, (1)

where $A : \mathbb{R}^n \to \mathbb{R}^m$ is given by $A(x)_i = |\langle x, z_i \rangle|^2$, and $A(x_0) = b$.

Problem (1) can be convexified by lifting it to a matrix recovery problem. Let $\mathcal{A}$ and its adjoint
be the linear operators

$\mathcal{A} : \mathcal{H}^{n \times n} \to \mathbb{R}^m$, $X \mapsto \{ z_i^* X z_i \}_{i = 1, \ldots, m}$,
$\mathcal{A}^* : \mathbb{R}^m \to \mathcal{H}^{n \times n}$, $\lambda \mapsto \sum_i \lambda_i z_i z_i^*$,

where $\mathcal{H}^{n \times n}$ is the space of $n \times n$ Hermitian matrices. Observe that $\mathcal{A}(xx^*) = A(x)$ for all vectors
$x$. Letting $X_0 = x_0 x_0^*$, we note that $\mathcal{A}(X_0) = b$. We emphasize that $\mathcal{A}$ is linear in $X$ whereas $A$ is
nonlinear in $x$.
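The relationship between the nonlinear map $A$ and the lifted linear map $\mathcal{A}$ can be checked numerically. The following NumPy sketch is only an illustration: the function names (`measure_vector`, `measure_matrix`, `adjoint`), the row-wise storage of the sensing vectors in `Z`, and the problem sizes are assumptions made here, not part of the paper.

```python
import numpy as np

def measure_vector(x, Z):
    """Nonlinear map A(x)_i = |<x, z_i>|^2; row i of Z holds z_i."""
    return np.abs(Z.conj() @ x) ** 2

def measure_matrix(X, Z):
    """Lifted linear map: i-th entry is z_i^* X z_i for Hermitian X."""
    return np.real(np.einsum("ij,jk,ik->i", Z.conj(), X, Z))

def adjoint(lam, Z):
    """Adjoint map: lam -> sum_i lam_i z_i z_i^*."""
    return np.einsum("i,ij,ik->jk", lam, Z, Z.conj())

rng = np.random.default_rng(0)
n, m = 6, 20                      # hypothetical sizes
Z = rng.standard_normal((m, n))   # real Gaussian sensing vectors
x = rng.standard_normal(n)
X = np.outer(x, x.conj())         # lifted variable X = x x^*

# Lifting identity: the linear map applied to x x^* equals A(x).
lifting_gap = np.max(np.abs(measure_matrix(X, Z) - measure_vector(x, Z)))

# Adjoint identity in the Hilbert-Schmidt inner product, for a real vector lam.
lam = rng.standard_normal(m)
adjoint_gap = abs(measure_matrix(X, Z) @ lam - np.trace(adjoint(lam, Z).conj().T @ X))
```

Both gaps should be at the level of floating-point round-off, which reflects that the quadratic measurements in $x$ are linear in $xx^*$.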

The matrix recovery problem we consider is

Find $X \succeq 0$ such that $\mathcal{A}(X) = b$. (2)

Without the positivity constraint, there would be multiple solutions whenever $m < \frac{(n+1)n}{2}$. We
include the constraint in order to allow for recovery in this classically underdetermined regime.

Our main result is that the matrix recovery problem (2) has a unique solution when there are
$O(n \log n)$ measurements.

Theorem 1. Let $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$ and $X_0 = x_0 x_0^*$. Let $m \ge c n \log n$ for a sufficiently large $c$. With
high probability, $X = X_0$ is the unique solution to $X \succeq 0$ and $\mathcal{A}(X) = b$. This probability is at least
$1 - e^{-\gamma m / n}$, for some $\gamma > 0$.

As a result, the phaseless recovery problem has a unique solution, up to a global phase, with
$O(n \log n)$ measurements. In the real-valued case, the problem is determined up to a minus sign.

Corollary 2. Let $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$. Let $m \ge c n \log n$ for a sufficiently large $c$. With high probability,
$\{ e^{i\phi} x_0 \}$ are the only solutions to $A(x) = b$. This probability is at least $1 - e^{-\gamma m / n}$, for some $\gamma > 0$.

Theorem 1 suggests ways of recovering $x_0$. If an $X \in \{X \succeq 0\} \cap \{X \mid \mathcal{A}(X) = b\}$ can be found, $x_0$
is given, up to a global phase, by the leading eigenvector of $X$. See Section 6 for more details on how to find $X$.
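The extraction of a vector estimate from a lifted solution can be sketched as follows. The helper name `estimate_from_lifted` is hypothetical, and scaling the eigenvector by the square root of the leading eigenvalue is an assumption made here so that the estimate has the right length; the text only specifies the leading eigenvector.

```python
import numpy as np

def estimate_from_lifted(X):
    """Return sqrt(lambda_1) * v_1, where (lambda_1, v_1) is the leading eigenpair of X."""
    X = (X + X.conj().T) / 2                # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(X)    # eigenvalues in ascending order
    lead_val, lead_vec = eigvals[-1], eigvecs[:, -1]
    return np.sqrt(max(lead_val, 0.0)) * lead_vec

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)                 # hypothetical signal
x_hat = estimate_from_lifted(np.outer(x0, x0))

# In the real case the estimate is determined only up to a minus sign.
sign_error = min(np.linalg.norm(x_hat - x0), np.linalg.norm(x_hat + x0))
```

When $X$ is exactly $x_0 x_0^*$, `sign_error` is at round-off level; for a noisy or higher-rank $X$ the same routine gives the rank-1 approximation used in the stability results.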

1.2 Stability result 

In practical applications, measurements are contaminated by noise. To show stability of
optimizationless recovery, we consider the model

$A(x_0) + \nu = b$,

where $\nu$ corresponds to a noise term with bounded $\ell_2$ norm, $\|\nu\|_2 \le \varepsilon$. The corresponding noisy
variant of (1) is

Find $x$ such that $\|A(x) - b\|_2 \le \varepsilon \|x_0\|_2^2$. (3)

We note that all three terms in (3) scale quadratically in $x$ or $x_0$.

Problem (3) can be convexified by lifting it to the space of matrices. The noisy matrix recovery
problem is

Find $X \succeq 0$ such that $\|\mathcal{A}(X) - b\|_2 \le \varepsilon \|X_0\|_2$. (4)

We show that all feasible $X$ are within an $O(\varepsilon)$ ball of $X_0$ provided there are $O(n \log n)$
measurements.

Theorem 3. Let $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$ and $X_0 = x_0 x_0^*$. Let $m \ge c n \log n$ for a sufficiently large $c$. With
high probability,

$X \succeq 0$ and $\|\mathcal{A}(X) - b\|_2 \le \varepsilon \|X_0\|_2 \implies \|X - X_0\|_2 \le C \varepsilon \|X_0\|_2$,

for some $C > 0$. This probability is at least $1 - e^{-\gamma m / n}$, for some $\gamma > 0$.

As a result, the phaseless recovery problem is stable with $O(n \log n)$ measurements.

Corollary 4. Let $x_0 \in \mathbb{R}^n$ or $\mathbb{C}^n$. Let $m \ge c n \log n$ for a sufficiently large $c$. With high probability,

$\|A(x) - b\|_2 \le \varepsilon \|x_0\|_2^2 \implies \|x - e^{i\phi} x_0\|_2 \le C \varepsilon \|x_0\|_2$,

for some $\phi \in [0, 2\pi)$, and for some $C > 0$. This probability is at least $1 - e^{-\gamma m / n}$, for some $\gamma > 0$.

Theorem 3 ensures that numerical methods can be used to find $X$. See Section 6 for ways of finding
$X \in \{X \succeq 0\} \cap \{\mathcal{A}(X) \approx b\}$. As the recovered matrix may have large rank, we approximate $x_0$
with the leading eigenvector of $X$.



1.3 Organization of this paper 

In Section 2, we prove a lemma containing the central argument for the proof of Theorem 1. Its
assumptions involve $\ell_1$-isometry properties and the existence of an inexact dual certificate. Section
2.3 provides the proof of Theorem 1 in the real-valued case. It cites [3] for the $\ell_1$-isometry properties
and Section 3 for existence of an inexact dual certificate. In Section 3, we construct an inexact dual
certificate and show that it satisfies the required properties in the real-valued case. In Section 4, we
prove Theorem 3 on stability in the real-valued case. In Section 5, we discuss the modifications in
the complex-valued case. In Section 6, we present two methods for performing the matrix recovery
numerically. We simulate them to establish stability empirically.



1.4 Notation 

We use boldface for variables representing vectors or matrices. We use normal typeface for scalar
quantities. Let $z_{ik}$ denote the $k$th entry of the vector $z_i$. For two matrices, let $\langle X, Y \rangle = \mathrm{Tr}(Y^* X)$
be the Hilbert-Schmidt inner product. Let $\sigma_i$ be the singular values of the matrix $X$. We define
the norms

$\|X\|_p = \left( \sum_i \sigma_i^p \right)^{1/p}$.

In particular, we write the Frobenius norm of $X$ as $\|X\|_2$. We write the spectral norm of $X$ as $\|X\|$.

An $n$-vector $x$ generates a decomposition of $\mathbb{R}^n$ or $\mathbb{C}^n$ into two subspaces. These subspaces
are the span of $x$ and the span of all vectors orthogonal to $x$. Abusing notation, we write these
subspaces as $x$ and $x^\perp$. The space of $n$-by-$n$ matrices is correspondingly partitioned into the four
subspaces $x \otimes x$, $x \otimes x^\perp$, $x^\perp \otimes x$, and $x^\perp \otimes x^\perp$, where $\otimes$ denotes the outer product. We write
$T_x$ for the set of symmetric matrices which lie in the direct sum of the first three subspaces, namely
$T_x = \{ x y^* + y x^* \mid y \in \mathbb{R}^n \text{ or } \mathbb{C}^n \}$. Correspondingly, we write $T_x^\perp$ for the set of symmetric matrices
in the fourth subspace. We note that $T_x^\perp$ is the orthogonal complement of $T_x$ with respect to the
Hilbert-Schmidt inner product. Let $e_1$ be the first coordinate vector. For short, let $T = T_{e_1}$ and
$T^\perp = T_{e_1}^\perp$. We denote the projection of $X$ onto $T$ as either $\mathcal{P}_T X$ or $X_T$. We denote projections
onto $T^\perp$ similarly.

We let $I$ be the $n \times n$ identity matrix. We denote the range of $\mathcal{A}^*$ by $\mathcal{R}(\mathcal{A}^*)$.

2 Proof of Main Result

Because of scaling and the property that the measurement vectors $z_i$ come from a rotationally
invariant distribution, we take $x_0 = e_1$ without loss of generality. Because all measurements scale
with the length $\|z_i\|_2$, it is equivalent to establish the result for independent unit normal sensing
vectors $z_i$. To prove Theorem 1, we use an argument based on inexact dual certificates and
$\ell_1$-isometry properties of $\mathcal{A}$. This argument parallels that of [3]. We directly use the $\ell_1$-isometry
properties they establish, but we require different properties on the inexact dual certificate.

2.1 About Dual Certificates 

As motivation for the introduction of an inexact dual certificate in the next section, observe that
if $\mathcal{A}$ is injective on $T$, and if there exists an (exact) dual certificate $Y \in \mathcal{R}(\mathcal{A}^*)$ such that

$Y_T = 0$ and $Y_{T^\perp} \succ 0$,

then $X_0$ is the only solution to $\mathcal{A}(X) = b$ with $X \succeq 0$. This is because

$0 = \langle X - X_0, Y \rangle = \langle X_{T^\perp}, Y_{T^\perp} \rangle \implies X_{T^\perp} = 0 \implies X = X_0$,

where the first equality is because $Y \in \mathcal{R}(\mathcal{A}^*)$ and $\mathcal{A}(X) = \mathcal{A}(X_0)$. The first implication holds
because $X_{T^\perp} \succeq 0$ and $Y_{T^\perp} \succ 0$. The last implication follows
from injectivity on $T$.

Conceptually, $Y$ arises as a Lagrange multiplier, dual to the constraint $X \succeq 0$ in the
objectiveless "optimization" problem

min 0 such that $\mathcal{A}(X) = b$, $X \succeq 0$.

Dual feasibility requires $Y \succeq 0$. As visualized in Figure 1a, $Y$ acts as a vector normal to a
codimension-1 hyperplane that separates the lower-dimensional space of solutions $\{\mathcal{A}(X) = b\}$
from the positive matrices not in $T$. The condition $Y_{T^\perp} \succ 0$ is further needed to ensure that this
hyperplane only intersects the cone along $T$, ensuring uniqueness of the solution.

The nullspace condition $Y_T = 0$ is what makes the certificate exact. As $Y \in \mathcal{R}(\mathcal{A}^*)$, $Y$ must
be of the form $\sum_i \lambda_i z_i z_i^*$. The strict requirement that $Y_T = 0$ would force the $\lambda_i$ to be complicated
(at best algebraic) functions of all the $z_j$, $j = 1, \ldots, m$. We follow [1] in constructing instead an
inexact dual certificate, such that $Y_T$ is close to but not equal to 0, and for which the $\lambda_i$ are more
tractable (quadratic) polynomials in the $z_i$. A careful inspection of the injectivity properties of $\mathcal{A}$,
in the form of the RIP-like condition in [3], is what allows the relaxation of the nullspace condition
on $Y$.

2.2 Central Lemma on Inexact Dual Certificates 

With further information about feasible $X$, we can relax the property that $Y_T$ is exactly zero. In
[1], the authors show that all feasible $X$ lie in a cone that is approximately $\{ \|X_{T^\perp}\|_1 \ge \|X_T - X_0\| \}$,
provided there are $O(n)$ measurements. As visualized in Figure 1b, $Y$ acts as a vector normal to
a hyperplane that separates $X_0$ from the rest of this cone. The proof of Theorem 1 hinges on the
existence of such an inexact dual certificate, along with $\ell_1$-isometry properties that establish $X$ is
in this cone with high probability.

Lemma 1. Suppose that $\mathcal{A}$ satisfies

$m^{-1} \|\mathcal{A}(X)\|_1 \le (1 + \delta) \|X\|_1$ for all $X \succeq 0$, (5)
$m^{-1} \|\mathcal{A}(X)\|_1 \ge 0.94 (1 - \delta) \|X\|$ for all $X \in T$, (6)

for some $\delta \le 1/9$. Suppose that there exists $Y \in \mathcal{R}(\mathcal{A}^*)$ satisfying

$\|Y_T\|_1 \le 1/2$ and $Y_{T^\perp} \succeq I_{T^\perp}$. (7)

Then, $X_0$ is the unique solution to (2).

[Figure 1, two panels: (a) Exact dual certificate; (b) Inexact dual certificate.]

Figure 1: The graphical interpretation of the exact and inexact dual certificates. The positive axes
represent the cone of positive matrices. The thick gray line represents the solutions to $\mathcal{A}(X) = b$.
The exact dual certificate $Y$ is a normal vector to a hyperplane that separates the space of solutions
from positive matrices. When the dual certificate is inexact, we use the fact that $\ell_1$-isometry
properties imply $X$ is restricted to the cone (8). The inexact dual certificate $Y$ is normal to a
hyperplane that separates $X_0$ from the rest of this restricted cone. As shown, the hyperplane
normal to $Y$ does not separate $X_0$ from positive matrices.

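Proof of Lemma 1. Let $X$ solve (2), and let $H = X - X_0$. We start by showing, as in [3], that the
$\ell_1$-isometry conditions (5)-(6) guarantee solutions lie on the cone

$\|H_{T^\perp}\|_1 \ge \frac{0.94 (1 - \delta)}{1 + \delta} \|H_T\|$. (8)

This is because

$0.94 (1 - \delta) \|H_T\| \le m^{-1} \|\mathcal{A}(H_T)\|_1 = m^{-1} \|\mathcal{A}(H_{T^\perp})\|_1 \le (1 + \delta) \|H_{T^\perp}\|_1$,

where the equality comes from $0 = \mathcal{A}(H) = \mathcal{A}(H_T) + \mathcal{A}(H_{T^\perp})$, and the two inequalities come from
the $\ell_1$-isometry properties (5)-(6) and the fact that $H_{T^\perp} \succeq 0$.
Because $\mathcal{A}(H) = 0$ and $Y \in \mathcal{R}(\mathcal{A}^*)$,

$0 = \langle H, Y \rangle$
$= \langle H_T, Y_T \rangle + \langle H_{T^\perp}, Y_{T^\perp} \rangle$
$\ge \|H_{T^\perp}\|_1 - \frac{1}{2} \|H_T\|$ (9)
$\ge \left( \frac{0.94 (1 - \delta)}{1 + \delta} - \frac{1}{2} \right) \|H_T\|$, (10)

where (9) and (10) follow from (7) and (8), respectively. Because the constant in (10) is positive,
we conclude $H_T = 0$. Then, (9) establishes $H_{T^\perp} = 0$. □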
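The positivity of the constant used in the last step of this proof can be checked by direct arithmetic. Inserting the cone condition (8) into (9) gives the factor $0.94(1-\delta)/(1+\delta) - 1/2$ in front of $\|H_T\|$. The short sketch below evaluates it at the largest admissible $\delta$, and does the same for the constants 0.828 and $\delta \le 3/13$ of the complex-valued Lemma 5 in Section 5. The function name `cone_margin` is hypothetical.

```python
def cone_margin(isometry_const, delta):
    """Factor multiplying ||H_T||: isometry_const * (1 - delta) / (1 + delta) - 1/2."""
    return isometry_const * (1 - delta) / (1 + delta) - 0.5

real_margin = cone_margin(0.94, 1 / 9)        # real-valued case, Lemma 1
complex_margin = cone_margin(0.828, 3 / 13)   # complex-valued case, Lemma 5
print(real_margin, complex_margin)
```

Both margins are positive, and since the factor decreases as $\delta$ grows, it is positive for every admissible $\delta$. The complex-valued margin is much smaller, which is consistent with the remark in Section 5 that the complex isometry constants are weaker.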
2.3 Proof of Theorem 1 and Corollary 2

We use Lemma 1 to prove Theorem 1 for real-valued signals.

Proof of Theorem 1. We need to show that (5)-(7) hold with high probability if $m \ge c n \log n$ for
some $c$. Lemmas 3.1 and 3.2 in [3] show that (5) and (6) both hold with probability at least
$1 - 3 e^{-\gamma_1 m}$ provided $m \ge c_1 n$ for some $c_1$. In Section 3, we construct $\bar{Y} \in \mathcal{R}(\mathcal{A}^*)$. As per Lemma 2,
$\|\bar{Y}_T\|_1 \le 1/2$ with probability at least $1 - e^{-\gamma_2 m / n}$ if $m \ge c_2 n$. As per Lemma 3, $\|\bar{Y}_{T^\perp} - 2 I_{T^\perp}\| \le 1$
with probability at least $1 - 2 e^{-\gamma_3 m / \log n}$ if $m \ge c_3 n \log n$. Hence, $\bar{Y}_{T^\perp} \succeq I_{T^\perp}$ with at least the
same probability. Hence, all of the conditions of Lemma 1 hold with probability at least $1 - e^{-\gamma m / n}$
if $m \ge c n \log n$ for some $c$ and $\gamma$. □

The proof of Corollary 2 is immediate because, with high probability, Theorem 1 implies

$A(x_1) = A(x_0) \implies x_1 x_1^* = x_0 x_0^* \implies x_1 = e^{i\phi} x_0$.



3 Existence of Inexact Dual Certificate 

To use Lemma 1 in the proof of Theorem 1, we need to show that there exists an inexact dual
certificate satisfying (7) with high probability. Our inexact dual certificate vector is different from
that in [3], but we use identical tools for its construction and analysis. We also adopt similar
notation.

We note that $\mathcal{A}^* \mathcal{A}(X) = \sum_i \langle X, z_i z_i^* \rangle z_i z_i^*$, which can alternatively be written as

$\mathcal{A}^* \mathcal{A} = \sum_{i=1}^m z_i z_i^* \otimes z_i z_i^*$.

We let $\mathcal{S} = \mathbb{E}[z_i z_i^* \otimes z_i z_i^*]$. The operator $\mathcal{S}$ is invertible. It and its inverse are given by

$\mathcal{S}(X) = 2X + \mathrm{Tr}(X) I$,
$\mathcal{S}^{-1}(X) = \frac{1}{2} \left( X - \frac{1}{n+2} \mathrm{Tr}(X) I \right)$. (11)

We define the inexact dual certificate

$\bar{Y} = \frac{1}{m} \sum_{i=1}^m 1_{E_i} Y_i$, (12)

where

$Y_i = \left[ \frac{3}{n+2} \|z_i\|_2^2 - z_{i1}^2 \right] z_i z_i^*$, (13)
$E_i = \{ |z_{i1}| \le \sqrt{2 \beta \log n} \} \cap \{ \|z_i\|_2 \le \sqrt{3n} \}$. (14)

Alternatively, we can write the inexact dual certificate vector as

$\bar{Y} = \frac{1}{m} \mathcal{A}^* \left( 1_E \circ \mathcal{A} \mathcal{S}^{-1} 2 (I - e_1 e_1^*) \right)$, (15)

where $(1_E)_i = 1_{E_i}$ and $\circ$ is the elementwise product of vectors. In our notation, truncated quantities
have overbars. We subsequently omit the subscript $i$ in $z_i$ when it is implied by context.

3.1 Motivation for the Dual Certificate

For ease of understanding, we first consider a candidate dual certificate given by

$Y = \frac{1}{m} \mathcal{A}^* \mathcal{A} \mathcal{S}^{-1} 2 (I - e_1 e_1^*)$.

The motivation for this candidate is twofold: $Y \in \mathcal{R}(\mathcal{A}^*)$, and $Y \to 2 (I - e_1 e_1^*)$ as $m \to \infty$ because
$\mathbb{E}[\mathcal{A}^* \mathcal{A}] = m \mathcal{S}$. In this limit, $Y$ becomes an exact dual certificate. For finite $m$, it should be close
but inexact. We can write

$Y = \frac{1}{m} \sum_i Y_i$,

where $Y_i$ is the independent sample of the random matrix

$\left[ \frac{3}{n+2} \|z\|_2^2 - z_1^2 \right] z z^*$

corresponding to $z_i$. Because the vector Bernstein inequality requires bounded vectors, we truncate
the dual certificate in the same manner as [3]. That is, we consider $1_{E_i} Y_i$, completing the derivation
of (12).
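The motivation above can be illustrated numerically. The sketch below, in the real-valued case, checks that $\mathcal{S}^{-1}(X) = \frac{1}{2}(X - \frac{1}{n+2}\mathrm{Tr}(X) I)$ inverts $\mathcal{S}(X) = 2X + \mathrm{Tr}(X) I$, that $\mathcal{S}^{-1} 2(I - e_1 e_1^*) = \frac{3}{n+2} I - e_1 e_1^*$, and that the untruncated candidate $Y$ approaches $2(I - e_1 e_1^*)$ for large $m$. The function names, the problem sizes, and the random seed are assumptions chosen for illustration.

```python
import numpy as np

def S(X):
    """S(X) = 2X + Tr(X) I (real Gaussian case, symmetric X)."""
    return 2 * X + np.trace(X) * np.eye(X.shape[0])

def S_inv(X):
    """S^{-1}(X) = (X - Tr(X)/(n+2) I) / 2."""
    n = X.shape[0]
    return 0.5 * (X - np.trace(X) / (n + 2) * np.eye(n))

rng = np.random.default_rng(0)
n, m = 6, 200_000                        # hypothetical sizes; m is large on purpose
e1 = np.eye(n)[:, 0]
target = 2 * (np.eye(n) - np.outer(e1, e1))

Z = rng.standard_normal((m, n))          # row i holds z_i
M = S_inv(target)                        # equals 3/(n+2) I - e1 e1^*
weights = np.einsum("ij,jk,ik->i", Z, M, Z)   # <z_i z_i^*, M> for each i
Y = (Z * weights[:, None]).T @ Z / m     # candidate (1/m) A^* A S^{-1} 2(I - e1 e1^*)

deviation = np.linalg.norm(Y - target, 2)     # spectral norm of Y - 2(I - e1 e1^*)
```

The value of `deviation` shrinks as `m` grows, which is the sense in which the candidate is close to, but not exactly, a dual certificate at finite $m$.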



3.2 Bounds on Y 

We define $\pi(\beta) = \mathbb{P}(E^c)$. In [3], the authors provide the bound $\pi(\beta) \le \mathbb{P}(|z_1| \ge \sqrt{2 \beta \log n}) +
\mathbb{P}(\|z\|_2^2 \ge 3n) \le n^{-\beta} + e^{-n/3}$, which holds if $2 \beta \log n \ge 1$.
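The two tail probabilities in this bound can be evaluated in closed form and compared with $n^{-\beta}$ and $e^{-n/3}$. The sketch below uses only the Python standard library. The function names are hypothetical, and the chi-squared tail is computed with the closed-form Poisson sum that is valid for even $n$, a restriction adopted here purely for simplicity.

```python
import math

def gaussian_tail(beta, n):
    """P(|g| >= sqrt(2 beta log n)) for a standard normal g."""
    return math.erfc(math.sqrt(beta * math.log(n)))

def chi2_tail_even(n, x):
    """P(chi^2_n >= x) for even n: exp(-x/2) * sum_{j < n/2} (x/2)^j / j!."""
    term, total = 1.0, 0.0
    for j in range(n // 2):
        total += term
        term *= (x / 2) / (j + 1)
    return math.exp(-x / 2) * total

n, beta = 50, 2.0                          # hypothetical values
pi_upper = gaussian_tail(beta, n) + chi2_tail_even(n, 3 * n)   # union bound on pi(beta)
stated_bound = n ** (-beta) + math.exp(-n / 3)
print(pi_upper, stated_bound)
```

For these values the exact tails sit below the stated bound, so the truncation event $E$ fails only with small probability once $n$ and $\beta$ are moderately large.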

We now present two lemmas that establish that $\bar{Y}$ is approximately $2 (I - e_1 e_1^*)$, and is thus an
inexact dual certificate satisfying (7).

Lemma 2. Let $\bar{Y}$ be given by (12). There exist positive $\gamma$ and $c$ such that for sufficiently large $n$

$\mathbb{P}\left( \|\bar{Y}_T\|_1 \ge 1/2 \right) \le \exp\left( -\gamma \frac{m}{n} \right)$

if $m \ge cn$.

Lemma 3. Let $\bar{Y}$ be given by (12). There exist positive $\gamma$ and $c$ such that for sufficiently large $n$

$\mathbb{P}\left( \|\bar{Y}_{T^\perp} - 2 I_{T^\perp}\| \ge 1 \right) \le 2 \exp\left( -\gamma \frac{m}{\log n} \right)$

if $m \ge c n \log n$.

3.3 Proof of Lemma 2: $\bar{Y}$ on $T$

We prove Lemma 2 in a way that parallels the corresponding proof in [3]. Observe that

$\|\bar{Y}_T\|_1 \le \sqrt{2} \|\bar{Y}_T\|_2 \le 2 \|\bar{Y}_T e_1\|_2$,

where the first inequality follows because $\bar{Y}_T$ has rank at most 2, and the second inequality follows
because $\bar{Y}_T$ can be nonzero only in its first row and column. We can write

$\bar{Y}_T e_1 = \frac{1}{m} \sum_{i=1}^m \bar{y}_i$,

where $\bar{y}_i = y_i 1_{E_i}$, and $y_i$ are independent samples of

$y = \left[ \frac{3}{n+2} \|z\|_2^2 - z_1^2 \right] z_1 z =: \xi z_1 z$.

To bound the $\ell_2$ norm of $\bar{Y}_T e_1$, we use the vector Bernstein inequality on $\bar{y}_i$.

Theorem 5 (Vector Bernstein inequality). Let $x_i$ be a sequence of independent random vectors
and set $V \ge \sum_i \mathbb{E} \|x_i\|_2^2$. Then for all $t \le V / \max \|x_i\|_2$, we have

$\mathbb{P}\left( \left\| \sum_i (x_i - \mathbb{E} x_i) \right\|_2 \ge \sqrt{V} + t \right) \le e^{-t^2 / 4V}$.

In order to apply this inequality, we need to compute $\max \|\bar{y}\|_2$, $\mathbb{E} \bar{y}$, and $\mathbb{E} \|\bar{y}\|_2^2$.

First, we compute $\max \|\bar{y}\|_2$. On the event $E$, $|z_1| \le \sqrt{2 \beta \log n}$ and $\|z\|_2 \le \sqrt{3n}$. If $n$ is large
enough that $2 \beta \log n \ge 9$, then $|\xi| \le 2 \beta \log n$. Thus,

$\|\bar{y}\|_2 \le \sqrt{24 n} (\beta \log n)^{3/2}$

for sufficiently large $n$.

Second, we find an upper bound for $\mathbb{E} \bar{y}$. Writing $y_1$ for the first entry of $y$, note that $\mathbb{E} y_1 = 0$ because

$\mathbb{E}[z_1^4] = 3$,
$\mathbb{E}[z_1^2 \|z\|_2^2] = n + 2$.

By symmetry, every entry of $\bar{y}$ has zero mean except the first. Hence,

$\|\mathbb{E} \bar{y}\|_2 = |\mathbb{E} \bar{y}_1| = |\mathbb{E}[y_1 1_E]| = |\mathbb{E}[y_1 1_{E^c}]| \le \sqrt{\mathbb{P}(E^c)} \sqrt{\mathbb{E} y_1^2} = \sqrt{\pi(\beta)} \sqrt{\mathbb{E} y_1^2}$.

Computing,

$y_1^2 = \left[ \frac{9}{(n+2)^2} \|z\|_2^4 - \frac{6}{n+2} z_1^2 \|z\|_2^2 + z_1^4 \right] z_1^4$,

we find

$\mathbb{E} y_1^2 \le 44$,

where we have used

$\mathbb{E}[z_1^8] = 105$, (16)
$\mathbb{E}[z_1^6 \|z\|_2^2] = 15n + 90$, (17)
$\mathbb{E}[z_1^4 \|z\|_2^4] = 3n^2 + 30n + 72$. (18)

Thus,

$\|\mathbb{E} \bar{y}\|_2 \le \sqrt{44 (n^{-\beta} + e^{-n/3})}$. (19)

Third, we find an upper bound for $\mathbb{E} \|\bar{y}\|_2^2$. Because $\|\bar{y}\|_2^2 \le \|y\|_2^2$, we write out

$\|y\|_2^2 = \xi^2 z_1^2 \|z\|_2^2 = z_1^6 \|z\|_2^2 - \frac{6}{n+2} z_1^4 \|z\|_2^4 + \frac{9}{(n+2)^2} z_1^2 \|z\|_2^6$.

Hence,

$\mathbb{E}[\|y\|_2^2] = (15n + 90) - \frac{6}{n+2} (3n^2 + 30n + 72) + \frac{9}{(n+2)^2} (n+2)(n+4)(n+6)$ (20)
$\le 8n + 16$, (21)

where we have used (17), (18), and

$\mathbb{E}[z_1^2 \|z\|_2^6] = (n+2)(n+4)(n+6)$. (22)
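The Gaussian moments used in this computation can be verified exactly. The sketch below evaluates $\mathbb{E}[z_1^{2a} \|z\|_2^{2b}]$ for $z \sim \mathcal{N}(0, I_n)$ by splitting $\|z\|_2^2 = z_1^2 + R$ with $R$ an independent chi-squared variable with $n - 1$ degrees of freedom, and then forms the two second moments with rational arithmetic. All function names are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from math import comb

def normal_even_moment(p):
    """E[g^(2p)] = (2p-1)!! for a standard normal g."""
    out = 1
    for odd in range(1, 2 * p, 2):
        out *= odd
    return out

def chi2_moment(dof, q):
    """E[R^q] = dof (dof+2) ... (dof+2q-2) for R ~ chi-squared(dof)."""
    out = 1
    for i in range(q):
        out *= dof + 2 * i
    return out

def mixed_moment(n, a, b):
    """E[z_1^(2a) ||z||_2^(2b)], expanding (z_1^2 + R)^b with the binomial theorem."""
    return sum(
        comb(b, j) * normal_even_moment(a + j) * chi2_moment(n - 1, b - j)
        for j in range(b + 1)
    )

def second_moment_y1(n):
    """E[y_1^2] with y_1 = (3/(n+2) ||z||^2 - z_1^2) z_1^2."""
    c = Fraction(3, n + 2)
    return c**2 * mixed_moment(n, 2, 2) - 2 * c * mixed_moment(n, 3, 1) + mixed_moment(n, 4, 0)

def second_moment_norm_y(n):
    """E[||y||_2^2] with y = (3/(n+2) ||z||^2 - z_1^2) z_1 z."""
    c = Fraction(3, n + 2)
    return c**2 * mixed_moment(n, 1, 3) - 2 * c * mixed_moment(n, 2, 2) + mixed_moment(n, 3, 1)

n = 10  # hypothetical dimension
print(mixed_moment(n, 3, 1), mixed_moment(n, 2, 2), mixed_moment(n, 1, 3))
print(float(second_moment_y1(n)), float(second_moment_norm_y(n)))
```

Under these formulas the second moment of the norm simplifies to $6(n+6)(n-1)/(n+2)$, which stays below $8n + 16$, and $\mathbb{E} y_1^2$ stays below 44 for every $n$.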



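Applying the vector Bernstein inequality with $V = m(8n + 16)$, we have that for all
$t \le (8n + 16) / [\sqrt{24 n} (\beta \log n)^{3/2}]$,

$\mathbb{P}\left( \left\| \frac{1}{m} \sum_i (\bar{y}_i - \mathbb{E} \bar{y}_i) \right\|_2 \ge \sqrt{\frac{8n + 16}{m}} + t \right) \le \exp\left( -\frac{m t^2}{4 (8n + 16)} \right)$.

Using the triangle inequality and (19), we get

$\mathbb{P}\left( \left\| \frac{1}{m} \sum_i \bar{y}_i \right\|_2 \ge \sqrt{44 (n^{-\beta} + e^{-n/3})} + \sqrt{\frac{8n + 16}{m}} + t \right) \le \exp\left( -\frac{m t^2}{4 (8n + 16)} \right)$.

Lemma 2 follows by choosing $t$, $\beta$, and $m \ge cn$ where $n$ and $c$ are large enough that

$\sqrt{44 (n^{-\beta} + e^{-n/3})} + \sqrt{\frac{8n + 16}{m}} + t \le \frac{1}{4}$.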

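3.4 Proof of Lemma 3: $\bar{Y}$ on $T^\perp$

We prove Lemma 3 in a way that parallels the corresponding proof in [1]. We write

$\bar{Y}_{T^\perp} - 2 I_{T^\perp} = \frac{1}{m} \sum_i \left( W_i 1_{E_i} - 2 I_{T^\perp} 1_{E_i^c} \right)$, (23)

where $W_i$ are independent samples of

$W = \left[ \frac{3}{n+2} \|z\|_2^2 - z_1^2 \right] \mathcal{P}_{T^\perp}(zz^*) - 2 I_{T^\perp}$. (24)

We decompose $W$ into the three terms

$W = -[z_1^2 - 1] \mathcal{P}_{T^\perp}(zz^*) + 3 \left[ \frac{1}{n+2} \|z\|_2^2 - 1 \right] \mathcal{P}_{T^\perp}(zz^*) + 2 \left( \mathcal{P}_{T^\perp}(zz^*) - I_{T^\perp} \right)$ (25)
$= W^{(0)} + W^{(1)} + W^{(2)}$.

Letting $\bar{W}^{(k)} = W^{(k)} 1_E$, it suffices to show that with high probability

$\left\| \frac{1}{m} \sum_i 2 I_{T^\perp} 1_{E_i^c} \right\| \le \frac{1}{4}$ and $\left\| \frac{1}{m} \sum_i \bar{W}_i^{(k)} \right\| \le \frac{1}{4}$ for $k = 0, 1, 2$. (26)

3.4.1 Bound on $I_{T^\perp} 1_{E^c}$

We show that $m^{-1} \| \sum_i I_{T^\perp} 1_{E_i^c} \| = m^{-1} \sum_i 1_{E_i^c}$ is small with probability at least $1 - 2 e^{-\gamma m}$ for
some constant $\gamma > 0$. To do this, we use the scalar Bernstein inequality.

Theorem 6 (Bernstein inequality). Let $\{X_i\}$ be a finite sequence of independent random variables.
Suppose that there exist $V$ and $c_0$ such that for all $X_i$ and all $k \ge 3$,

$\sum_i \mathbb{E} |X_i|^k \le \frac{1}{2} k! \, V c_0^{k-2}$.

Then for all $t \ge 0$,

$\mathbb{P}\left( \left| \sum_i (X_i - \mathbb{E} X_i) \right| \ge t \right) \le 2 \exp\left( -\frac{t^2}{2V + 2 c_0 t} \right)$. (27)

Observing that $\mathbb{E} |1_{E^c}|^k = \mathbb{E} 1_{E^c} = \pi(\beta)$, we apply the Bernstein inequality with $V = \pi(\beta) m$ and
$c_0 = 1/3$. Thus,

$\mathbb{P}\left( \left| \frac{1}{m} \sum_i 1_{E_i^c} - \pi(\beta) \right| \ge t \right) \le 2 \exp\left( -\frac{m t^2}{2 \pi(\beta) + 2t/3} \right)$.

Using the triangle inequality and taking $t$ and $\beta$ such that $\pi(\beta) + t \le 1/8$ for sufficiently large $n$,
we get

$\mathbb{P}\left( \frac{1}{m} \sum_i 1_{E_i^c} \ge \frac{1}{8} \right) \le 2 \exp(-\gamma m)$

for a $\gamma > 0$.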

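3.4.2 Bound on $W^{(0)}$

We show $m^{-1} \| \sum_i \bar{W}_i^{(0)} \|$ is small with probability at least $1 - 2 \exp(-\gamma m / \log n)$. We write this norm
as a supremum over all unit vectors perpendicular to $e_1$:

$\left\| \sum_i \bar{W}_i^{(0)} \right\| = \sup_{u \perp e_1, \|u\| = 1} \left| \sum_i \langle u, \bar{W}_i^{(0)} u \rangle \right|$. (28)

To control the supremum, we follow the same reasoning as in [3]. We bound $\sum_i \langle u, \bar{W}_i^{(0)} u \rangle$ for
fixed $u$ and apply a covering argument over the sphere of $u$'s. We write

$\sum_i \langle u, \bar{W}_i^{(0)} u \rangle = \sum_i \eta_i 1_{E_i}$,

where $\eta_i$ are independent samples of

$\eta = -[z_1^2 - 1] \langle z, u \rangle^2$.

To apply the scalar Bernstein inequality, we compute $\mathbb{E} |\eta 1_E|^k$. Because $u \perp e_1$, $z_1$ and $\langle z, u \rangle$ are
independent. Hence,

$\mathbb{E} |\eta 1_E|^k \le \mathbb{E} |(z_1^2 - 1) 1_E|^k \, \mathbb{E} |\langle z, u \rangle|^{2k}$.

Bounding the first factor, we get

$\mathbb{E} |(z_1^2 - 1) 1_E|^k = \mathbb{E} |(z_1^2 - 1)^{k-2} 1_E (z_1^2 - 1)^2| \le (2 \beta \log n)^{k-2} \mathbb{E} (z_1^2 - 1)^2 = 2 (2 \beta \log n)^{k-2}$.

Observing that $\langle z, u \rangle^2$ is a chi-squared variable with one degree of freedom, we have

$\mathbb{E} |\langle z, u \rangle|^{2k} = 1 \times 3 \times \cdots \times (2k - 1) \le 2^k k!$.

Applying the scalar Bernstein inequality with $V = 16m$ and $c_0 = 4 \beta \log n$, we get

$\mathbb{P}\left( \frac{1}{m} \left| \sum_i \left( \eta_i 1_{E_i} - \mathbb{E}[\eta_i 1_{E_i}] \right) \right| \ge t \right) \le 2 \exp\left( -\frac{m t^2}{2 (16 + 4 \beta t \log n)} \right)$.

Because $\mathbb{E} \eta_i = 0$, we get

$|\mathbb{E}[\eta 1_E]| = |\mathbb{E}[\eta 1_{E^c}]| \le \sqrt{\mathbb{P}(E^c)} \sqrt{\mathbb{E} \eta^2} = \sqrt{6 \pi(\beta)}$,

where we have used $\mathbb{E} (1 - z_1^2)^2 = 2$, and $\mathbb{E} |\langle z, u \rangle|^4 = 3$. Hence,

$\mathbb{P}\left( \frac{1}{m} \left| \sum_i \eta_i 1_{E_i} \right| \ge t + \sqrt{6 \pi(\beta)} \right) \le 2 \exp\left( -\frac{m t^2}{2 (16 + 4 \beta t \log n)} \right)$.

Taking $t$, $\beta$, $m \ge c_1 n$ with $n$ large enough so that $t + \sqrt{6 \pi(\beta)} \le 1/8$, we have

$\mathbb{P}\left( \frac{1}{m} \left| \sum_i \eta_i 1_{E_i} \right| \ge \frac{1}{8} \right) \le 2 \exp\left( -\gamma' \frac{m}{\log n} \right)$

for some $\gamma' > 0$. To complete the bound on (28), we use Lemma 4 in [12]:

$\sup_u \left| \sum_i \langle u, \bar{W}_i^{(0)} u \rangle \right| \le 2 \sup_{u \in \mathcal{N}_{1/4}} \left| \sum_i \langle u, \bar{W}_i^{(0)} u \rangle \right|$,

where $\mathcal{N}_{1/4}$ is a 1/4-net of the unit sphere of vectors $u \perp e_1$. As $|\mathcal{N}_{1/4}| \le 9^n$, a union bound gives

$\mathbb{P}\left( \sup_{u \in \mathcal{N}_{1/4}} \frac{1}{m} \left| \sum_i \langle u, \bar{W}_i^{(0)} u \rangle \right| \ge \frac{1}{8} \right) \le 9^n \cdot 2 \exp\left( -\gamma' \frac{m}{\log n} \right)$.

Hence,

$\mathbb{P}\left( \frac{1}{m} \left\| \sum_i \bar{W}_i^{(0)} \right\| \ge \frac{1}{4} \right) \le 2 \exp(-\gamma m / \log n)$

for some $\gamma > 0$.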

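3.4.3 Bounds on $W^{(1)}$ and $W^{(2)}$

The bound for the $\| \sum_i \bar{W}_i^{(1)} \|$ term is similar. We write

$\sum_i \langle u, \bar{W}_i^{(1)} u \rangle = \sum_i \eta_i 1_{E_i}$,

where $\eta_i$ are independent samples of

$\eta = 3 \left[ \frac{\|z\|_2^2}{n+2} - 1 \right] \langle z, u \rangle^2$.

We can bound $\mathbb{E} |\eta 1_E|^k \le 12^k k!$ because $\|z\|_2^2 \le 3n$ on $E$. Applying the scalar Bernstein inequality
with $c_0 = 12$ and $V = 288m$ gives

$\mathbb{P}\left( \frac{1}{m} \left| \sum_i \left( \eta_i 1_{E_i} - \mathbb{E}[\eta_i 1_{E_i}] \right) \right| \ge t \right) \le 2 \exp\left( -\frac{m t^2}{2 (288 + 12t)} \right)$.

The rest of the bound is similar to that of $\| \sum_i \bar{W}_i^{(0)} \|$ above.
Finally, we also bound $\| \sum_i \bar{W}_i^{(2)} \|$ similarly. We write

$\sum_i \langle u, \bar{W}_i^{(2)} u \rangle = \sum_i \eta_i 1_{E_i}$,

where $\eta_i$ are independent samples of

$\eta = 2 \langle z, u \rangle^2 - 2$.

Observing that

$\mathbb{E} |\eta 1_E|^k \le 4^k k!$,

we apply the scalar Bernstein inequality with $c_0 = 4$ and $V = 32m$, giving

$\mathbb{P}\left( \frac{1}{m} \left| \sum_i \left( \eta_i 1_{E_i} - \mathbb{E}[\eta_i 1_{E_i}] \right) \right| \ge t \right) \le 2 \exp\left( -\frac{m t^2}{2 (32 + 4t)} \right)$.

The rest of the bound is as above.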



4 Stability 

We now prove Theorem 3, establishing the stability of the matrix recovery problem (4). We also
prove Corollary 4, establishing the stability of the vector recovery problem (3). As in the exact
case, the proof of Theorem 3 hinges on the $\ell_1$-isometry properties (5)-(6) and the existence of an
inexact dual certificate satisfying (7). For stability, we use the additional property that $Y = \mathcal{A}^* \lambda$
for a $\lambda$ controlled in $\ell_1$. It suffices to establish an analogue of Lemma 1 along with a bound on
$\|\lambda\|_1$.

Lemma 4. Suppose that $\mathcal{A}$ satisfies (5)-(6) and there exists $Y = \mathcal{A}^* \lambda$ satisfying (7) and $\|\lambda\|_1 \le 5$.
Then,

$X \succeq 0$ and $\|\mathcal{A}(X) - b\|_2 \le \varepsilon \|X_0\|_2 \implies \|X - X_0\|_2 \le C \varepsilon \|X_0\|_2$,

for some $C > 0$.

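Proof of Lemma 4. As before, we take $x_0 = e_1$ and $X_0 = e_1 e_1^*$ without loss of generality. Consider
any $X \succeq 0$ such that $\|\mathcal{A}(X) - b\|_2 \le \varepsilon$, and let $H = X - X_0$. Whereas $\mathcal{A}(H) = 0$ in the noiseless
case, it is now of order $\varepsilon$ because

$\|\mathcal{A}(H)\|_2 \le \|\mathcal{A}(X) - b\|_2 + \|\mathcal{A}(X_0) - b\|_2 \le 2 \varepsilon$. (29)

Similarly, $|\langle H, Y \rangle|$ is also of order $\varepsilon$ because

$|\langle H, Y \rangle| = |\langle \mathcal{A}(H), \lambda \rangle| \le \|\mathcal{A}(H)\|_\infty \|\lambda\|_1 \le \|\mathcal{A}(H)\|_2 \|\lambda\|_1 \le 10 \varepsilon$.

Analogous to the proof of Lemma 1, we use (7) to compute that

$10 \varepsilon \ge \langle H, Y \rangle \ge \|H_{T^\perp}\|_1 - \frac{1}{2} \|H_T\|$. (30)

Using the $\ell_1$-isometry properties (5)-(6), we have

$0.94 (1 - \delta) \|H_T\| \le m^{-1} \|\mathcal{A}(H_T)\|_1 \le m^{-1} \|\mathcal{A}(H)\|_1 + m^{-1} \|\mathcal{A}(H_{T^\perp})\|_1$
$\le m^{-1/2} \|\mathcal{A}(H)\|_2 + (1 + \delta) \|H_{T^\perp}\|_1$
$\le 2 \varepsilon m^{-1/2} + (1 + \delta) \|H_{T^\perp}\|_1$. (31)

Thus (30) becomes

$\left( 10 + \frac{m^{-1/2}}{0.94 (1 - \delta)} \right) \varepsilon \ge \left( 1 - \frac{1 + \delta}{2 \cdot 0.94 (1 - \delta)} \right) \|H_{T^\perp}\|_1$, (32)

which, along with (31), implies

$\|H_{T^\perp}\|_1 \le C_0 \varepsilon$ and $\|H_T\| \le C_1 \varepsilon$ (33)

for some $C_0, C_1 > 0$. Recalling that $H_T$ has rank at most 2,

$\|H\|_2 \le \|H_T\|_2 + \|H_{T^\perp}\|_2 \le \sqrt{2} \|H_T\| + \|H_{T^\perp}\|_1 \le (\sqrt{2} C_1 + C_0) \varepsilon \le C \varepsilon$. □

4.1 Dual Certificate Property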



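It remains to show $\|\lambda\|_1 \le 5$ for $Y = \mathcal{A}^* \lambda$. From (15), we identify $\lambda = m^{-1} 1_E \circ \mathcal{A} \mathcal{S}^{-1} 2 (I - e_1 e_1^*)$.
Computing,

$\|\lambda\|_1 = m^{-1} \| 1_E \circ \mathcal{A} \mathcal{S}^{-1} 2 (I - e_1 e_1^*) \|_1$
$\le m^{-1} \| \mathcal{A} \mathcal{S}^{-1} 2 (I - e_1 e_1^*) \|_1$
$\le m^{-1} \left\| \mathcal{A} \left( \frac{3}{n+2} I - e_1 e_1^* \right) \right\|_1$ (34)
$\le (1 + \delta) \left[ \frac{3}{n+2} \|I\|_1 + \|e_1 e_1^*\|_1 \right]$ (35)
$\le 4 (1 + \delta)$,

where (34) follows from (11), and (35) follows from the triangle inequality and the $\ell_1$-isometry
property (5). Hence $\|\lambda\|_1 \le 5$.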
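The multiplier $\lambda$ and the truncated certificate $\bar{Y} = \mathcal{A}^* \lambda$ can be formed explicitly for a random draw. The sketch below does this in the real-valued case and reports $\|\lambda\|_1$ together with the two quantities that appear in condition (7). The sizes, the truncation level `beta`, and the seed are assumptions chosen for illustration, and the printed diagnostics describe one random instance rather than a guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, beta = 20, 2000, 2.0               # hypothetical sizes and truncation level
Z = rng.standard_normal((m, n))          # row i holds z_i
sq_norms = np.sum(Z**2, axis=1)

# Truncation events E_i = {|z_i1| <= sqrt(2 beta log n)} and {||z_i||_2 <= sqrt(3n)}
E = (np.abs(Z[:, 0]) <= np.sqrt(2 * beta * np.log(n))) & (sq_norms <= 3 * n)

# lam_i = (1/m) 1_{E_i} <z_i z_i^*, 3/(n+2) I - e1 e1^*>
lam = E * (3.0 / (n + 2) * sq_norms - Z[:, 0] ** 2) / m
lam_l1 = np.sum(np.abs(lam))

# Truncated certificate Y_bar = A^*(lam) = sum_i lam_i z_i z_i^*
Y_bar = (Z * lam[:, None]).T @ Z

# Diagnostics for condition (7): nuclear norm on T, smallest eigenvalue on T-perp.
Y_perp = Y_bar.copy()
Y_perp[0, :] = 0.0
Y_perp[:, 0] = 0.0
Y_T = Y_bar - Y_perp
nuclear_T = np.sum(np.linalg.svd(Y_T, compute_uv=False))
min_eig_perp = np.linalg.eigvalsh(Y_bar[1:, 1:]).min()
print(lam_l1, nuclear_T, min_eig_perp)
```

The value `lam_l1` can be compared against the bound $\|\lambda\|_1 \le 5$, while `nuclear_T` and `min_eig_perp` can be compared against the thresholds 1/2 and 1 in (7).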

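4.2 Proof of Corollary 4

Now we prove Corollary 4, showing that stability of the lifted problem (4) implies stability of the
unlifted problem (3). As before, we take $x_0 = e_1$ without loss of generality. Hence $\|X_0\|_2 = 1$.
Lemma 4 establishes that $\|X - X_0\| \le C_0 \varepsilon$. Recall that $X_0 = x_0 x_0^*$. Decompose $X = \sum_j \lambda_j v_j v_j^*$
with unit-normalized eigenvectors $v_j$ sorted by decreasing eigenvalue. By Weyl's perturbation
theorem,

$\max\{ |1 - \lambda_1|, |\lambda_2|, \ldots, |\lambda_n| \} \le C_0 \varepsilon$. (36)

Writing

$X_0 - v_1 v_1^* = (X_0 - X) + \left( (\lambda_1 - 1) v_1 v_1^* + \sum_{j=2}^n \lambda_j v_j v_j^* \right)$, (37)

we use the triangle inequality to form the spectral bound

$\|X_0 - v_1 v_1^*\| \le 2 C_0 \varepsilon$.

Noting that

$1 - |\langle x_0, v_1 \rangle|^2 = \frac{1}{2} \|X_0 - v_1 v_1^*\|_2^2 \le \|X_0 - v_1 v_1^*\|^2 \le 4 C_0^2 \varepsilon^2$,

where the first inequality holds because $X_0 - v_1 v_1^*$ has rank at most 2, and choosing the global phase
of $v_1$ so that $\langle x_0, v_1 \rangle \ge 0$, we conclude

$\|x_0 - v_1\|_2^2 = 2 - 2 \langle x_0, v_1 \rangle \le 2 \left( 1 - \langle x_0, v_1 \rangle^2 \right) \le 8 C_0^2 \varepsilon^2$.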
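The two facts used in this argument, Weyl's eigenvalue bound and the identity relating the overlap $\langle x_0, v_1 \rangle$ to the Frobenius distance between rank-1 projectors, can be checked on a small example. In the sketch below the perturbation `P` is an arbitrary small symmetric matrix standing in for $X - X_0$. This is an assumption for illustration only, and the perturbed matrix is not projected onto the positive semi-definite cone, since neither fact requires it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10                                    # hypothetical dimension
x0 = np.zeros(n)
x0[0] = 1.0                               # x_0 = e_1, so ||X_0||_2 = 1
X0 = np.outer(x0, x0)

P = rng.standard_normal((n, n))
P = 1e-3 * (P + P.T) / 2                  # small symmetric perturbation
X = X0 + P

eigvals, eigvecs = np.linalg.eigh(X)      # ascending eigenvalues
v1 = eigvecs[:, -1]
if v1 @ x0 < 0:                           # fix the sign ambiguity of the eigenvector
    v1 = -v1

# Weyl: every eigenvalue of X is within ||X - X_0|| of the matching eigenvalue of X_0.
weyl_lhs = max(abs(1 - eigvals[-1]), np.max(np.abs(eigvals[:-1])))
weyl_rhs = np.linalg.norm(P, 2)

# Identity: 1 - <x0, v1>^2 equals half the squared Frobenius norm of X_0 - v1 v1^T.
overlap_gap = 1 - (x0 @ v1) ** 2
half_frobenius = 0.5 * np.linalg.norm(X0 - np.outer(v1, v1), "fro") ** 2
vector_error = np.linalg.norm(x0 - v1) ** 2
```

With the sign of `v1` fixed as above, `vector_error` equals $2 - 2\langle x_0, v_1 \rangle$ and is at most twice `overlap_gap`, which is how an error bound on the lifted matrix transfers to the vector estimate.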

5 Complex Case 

The proofs of Theorems 1 and 3 are analogous in the complex-valued case. There are a few
minor differences, as outlined and proved in [3]. The sensing vectors are assumed to be of the
form $\Re z_i \sim \mathcal{N}(0, I)$ and $\Im z_i \sim \mathcal{N}(0, I)$. The $\ell_1$-isometry conditions for complex $\mathcal{A}$ have weaker
constants. Lemma 1 becomes

Lemma 5. Suppose that $\mathcal{A}$ satisfies

$m^{-1} \|\mathcal{A}(X)\|_1 \le (1 + \delta) \|X\|_1$ for all $X \succeq 0$,

$m^{-1} \|\mathcal{A}(X)\|_1 \ge 0.828 (1 - \delta) \|X\|$ for all $X \in T$,

for some $\delta \le 3/13$. Suppose that there exists $Y \in \mathcal{R}(\mathcal{A}^*)$ satisfying

$\|Y_T\|_1 \le 1/2$ and $Y_{T^\perp} \succeq I_{T^\perp}$.

Then, $X_0$ is the unique solution to (2).

The proof of this lemma is identical to the real-valued case. The conditions of the lemma are
satisfied with high probability, as before.

The construction of the inexact dual certificate is slightly different because $\mathcal{S}(X) = X + \mathrm{Tr}(X) I$
and $\mathcal{S}^{-1}(X) = X - \frac{1}{n+1} \mathrm{Tr}(X) I$. As a result

$Y_i = \left[ \frac{4}{n+1} \|z_i\|_2^2 - 2 |z_{i1}|^2 \right] z_i z_i^*$.

The remaining modifications are identical to those in [1], and we refer interested readers there for
details.



6 Numerical Simulations 

We now present two procedures to find $X$ from noisy measurements $b$. We then study each
empirically through numerical simulation. In application, we may not have prior knowledge of $\varepsilon$ or
$\|X_0\|_2$. Hence, our recovery methods do not make use of either quantity. Two iterative procedures
for finding an $X \in \{X \succeq 0\} \cap \{\mathcal{A}(X) \approx b\}$ are projection onto convex sets (POCS) and projected
gradient descent.

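For the POCS approach, we let

$X_{n+1} = \mathcal{P}_{\mathrm{psd}} \mathcal{P}_{\{\mathcal{A}(X) = b\}} X_n$,

where the subscript of $X_n$ is the iteration number, $\mathcal{P}_{\mathrm{psd}}$ is the projector onto the positive
semi-definite cone of matrices, and $\mathcal{P}_{\{\mathcal{A}(X) = b\}}$ is the
projector onto the affine space of solutions to $\mathcal{A}(X) = b$. In the classically underdetermined case,
$m < \frac{(n+1)n}{2}$, we can write

$\mathcal{P}_{\{\mathcal{A}(X) = b\}} X = X - \mathcal{A}^* (\mathcal{A} \mathcal{A}^*)^{-1} \mathcal{A}(X) + \mathcal{A}^* (\mathcal{A} \mathcal{A}^*)^{-1} b$.

In the critically-determined and overdetermined cases, we interpret $\mathcal{P}_{\{\mathcal{A}(X) = b\}}$ as the least-squares
solution to $\mathcal{A}(X) = b$. In these cases, iteration is unnecessary.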
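The POCS iteration in the classically underdetermined case can be sketched as follows for real-valued data. This is a minimal illustration and not the authors' code: the function names, the zero initialization, the problem sizes, and the iteration count are assumptions. It uses the observation that $(\mathcal{A} \mathcal{A}^*)_{ij} = \langle z_i, z_j \rangle^2$, so the Gram matrix can be formed directly from the sensing vectors.

```python
import numpy as np

def lifted_op(X, Z):
    """A(X)_i = z_i^T X z_i (real case); row i of Z holds z_i."""
    return np.einsum("ij,jk,ik->i", Z, X, Z)

def lifted_adjoint(c, Z):
    """A^*(c) = sum_i c_i z_i z_i^T."""
    return (Z * c[:, None]).T @ Z

def project_psd(X):
    """Frobenius-norm projection onto the positive semi-definite cone."""
    X = (X + X.T) / 2
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def project_affine(X, Z, b, gram):
    """X - A^*(A A^*)^{-1}(A(X) - b), valid when gram = A A^* is invertible."""
    coeffs = np.linalg.solve(gram, lifted_op(X, Z) - b)
    return X - lifted_adjoint(coeffs, Z)

def pocs(Z, b, iters, X_true=None):
    """Alternate the two projections from X = 0; optionally track the relative error."""
    n = Z.shape[1]
    gram = (Z @ Z.T) ** 2          # (A A^*)_{ij} = <z_i, z_j>^2
    X = np.zeros((n, n))
    errors = []
    for _ in range(iters):
        if X_true is not None:
            errors.append(np.linalg.norm(X - X_true) / np.linalg.norm(X_true))
        X = project_psd(project_affine(X, Z, b, gram))
    return X, np.array(errors)

rng = np.random.default_rng(0)
n, m = 8, 28                        # hypothetical sizes with m < n(n+1)/2 = 36
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
X0 = np.outer(x0, x0)
Z = rng.standard_normal((m, n))
b = lifted_op(X0, Z)                # noiseless measurements

X_hat, errors = pocs(Z, b, iters=300, X_true=X0)
```

In the noiseless case $X_0$ lies in both convex sets, so each projection cannot increase the Frobenius distance to $X_0$ and the recorded errors are non-increasing. How quickly they decay depends on $n$, $m$, and the random draw, and this sketch makes no claim about that rate.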
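For the projected gradient descent approach, we let

$X_{n+1} = \mathcal{P}_{\mathrm{psd}} \left[ X_n - \alpha \mathcal{A}^* (\mathcal{A}(X_n) - b) \right]$,

where $\alpha = 10^{-4}$. Notice that $\mathcal{A}^* (\mathcal{A}(X) - b)$ is the gradient in $X$ of $\frac{1}{2} \|\mathcal{A}(X) - b\|_2^2$. This approach
relaxes the task of finding $X \succeq 0$ such that $\mathcal{A}(X) = b$ to that of minimizing $\|\mathcal{A}(X) - b\|_2^2$ subject
to $X \succeq 0$.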
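A minimal sketch of the projected gradient iteration for real-valued data follows. The step size $\alpha = 10^{-4}$ is the value quoted above; the function names, the zero initialization, the problem sizes, and the way the noise vector is scaled are assumptions made for illustration.

```python
import numpy as np

def lifted_op(X, Z):
    """A(X)_i = z_i^T X z_i (real case); row i of Z holds z_i."""
    return np.einsum("ij,jk,ik->i", Z, X, Z)

def lifted_adjoint(c, Z):
    """A^*(c) = sum_i c_i z_i z_i^T."""
    return (Z * c[:, None]).T @ Z

def project_psd(X):
    """Frobenius-norm projection onto the positive semi-definite cone."""
    X = (X + X.T) / 2
    eigvals, eigvecs = np.linalg.eigh(X)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def projected_gradient(Z, b, alpha, iters):
    """Iterate X <- P_psd[X - alpha A^*(A(X) - b)] from X = 0, recording the misfit."""
    n = Z.shape[1]
    X = np.zeros((n, n))
    misfits = []
    for _ in range(iters):
        residual = lifted_op(X, Z) - b
        misfits.append(0.5 * residual @ residual)   # (1/2) ||A(X) - b||_2^2
        X = project_psd(X - alpha * lifted_adjoint(residual, Z))
    return X, np.array(misfits)

rng = np.random.default_rng(0)
n, m = 8, 60                        # hypothetical sizes
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
X0 = np.outer(x0, x0)
Z = rng.standard_normal((m, n))
noise = rng.standard_normal(m)
noise *= 0.1 / np.linalg.norm(noise)    # ||noise||_2 = 0.1, matching epsilon = 1/10
b = lifted_op(X0, Z) + noise

alpha = 1e-4
X_hat, misfits = projected_gradient(Z, b, alpha, iters=2000)
relative_error = np.linalg.norm(X_hat - X0) / np.linalg.norm(X0)
```

For a step size no larger than the reciprocal of the largest eigenvalue of $\mathcal{A} \mathcal{A}^*$, the standard analysis of projected gradient descent gives a non-increasing misfit, and that condition holds for the sizes used here. Larger problems may need a smaller step or a different stopping rule.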

With noisy data, it is possible that there are no positive semi-definite matrices that strictly
agree with the measurements. That is, it may be that $\{X \succeq 0\} \cap \{\mathcal{A}(X) = b\}$ is empty. In this
case, POCS will oscillate between a solution to $\mathcal{A}(X) = b$ and something positive. This failure
mode should be most apparent when there are as many measurements as unknowns,
that is, when $m = \frac{(n+1)n}{2}$. In contrast, gradient descent will approach the positive matrix of least data
misfit.

For our simulations, we consider an $x_0 \in \mathbb{R}^n$ sampled uniformly from the unit sphere. We take
independent, real-valued $z_i \sim \mathcal{N}(0, I)$, and let the measurements be noisy with $\varepsilon = 1/10$. We let $n$
vary from 5 to 50 and let $m$ vary from 10 to 250. We define the recovery error as $\|X - X_0\|_2 / \|X_0\|_2$.

Figure 2 shows the average recovery error under the POCS and projected gradient descent
methods over a range of values of $n$ and $m$. Each pair of values was independently sampled
10 times, and both methods were run for 2000 iterations. The plot shows that the number of
measurements needed for recovery is approximately linear in $n$, significantly lower than the amount
for which there are as many measurements as unknowns.

In the classically underdetermined regime, achieving a given error with POCS requires slightly
fewer measurements than with projected gradient descent under our choice of $\alpha$. As guessed above,
POCS gives large recovery errors in the critically determined case, along $m \approx \frac{(n+1)n}{2}$. Projected
gradient descent performs well in this case.

Figure 3 shows recovery error versus iteration number under the POCS and projected gradient
descent methods. It shows a single recovery for $n = 40$ and $m = 250$ in the noiseless and noisy
cases. For both methods, convergence is initially linear. In the presence of noise, the convergence
tapers off around the noise level. For the chosen value of $\alpha$, projected gradient descent converges
at a slower rate than POCS, though each POCS iteration is more expensive. In our experiments
the runtimes of both methods were comparable.

References 

[1] R. Balan, P. Casazza, and D. Edidin. On signal reconstruction without noisy phase. Appl.
Comp. Harm. Anal., 20:345-356, 2006.



[Figure 2, two panels: POCS; Gradient Descent. Horizontal axis: number of measurements (m).]

Figure 2: Recovery error for the POCS and projected gradient descent solutions to the noisy matrix
recovery problem (4) as a function of $n$ and $m$. In these plots, $\varepsilon = 10^{-1}$. Black represents an average
recovery error of 100%. White represents zero average recovery error. Each block corresponds to
the average of 10 independent samples. The solid curve depicts when there are the same number of
measurements as degrees of freedom. The number of measurements required for recovery appears
to be roughly linear, as opposed to quadratic, in $n$. The POCS algorithm has large recovery errors
near the curve where the number of measurements equals the number of degrees of freedom.

Figure 3: The relative error versus iteration number for the noiseless and noisy matrix recovery
problems, (2) and (4), under the POCS and projected gradient descent methods. As shown, $n = 40$
and the number of measurements is $m = 250$. As expected, convergence is exponential until it
saturates due to noise.